Subspace Embeddings and ℓp-Regression Using Exponential Random Variables

Authors

  • David P. Woodruff
  • Qin Zhang
Abstract

Oblivious low-distortion subspace embeddings are a crucial building block for numerical linear algebra problems. We show that for any real \(p\), \(1 \le p < \infty\), given a matrix \(M \in \mathbb{R}^{n \times d}\) with \(n \gg d\), with constant probability we can choose a matrix \(\Pi\) with \(\max(1, n^{1-2/p})\,\mathrm{poly}(d)\) rows and \(n\) columns so that simultaneously for all \(x \in \mathbb{R}^d\), \(\|Mx\|_p \le \|\Pi M x\|_\infty \le \mathrm{poly}(d)\,\|Mx\|_p\). Importantly, \(\Pi M\) can be computed in the optimal \(O(\mathrm{nnz}(M))\) time, where \(\mathrm{nnz}(M)\) is the number of non-zero entries of \(M\). This generalizes all previous oblivious subspace embeddings, which required \(p \in [1, 2]\) due to their use of \(p\)-stable random variables. Using our matrices \(\Pi\), we also improve the best known distortion of oblivious subspace embeddings of \(\ell_1\) into \(\ell_1\) with \(\tilde{O}(d)\) target dimension in \(O(\mathrm{nnz}(M))\) time from \(\tilde{O}(d^3)\) to \(\tilde{O}(d^2)\), which can further be improved to \(\tilde{O}(d^{3/2}) \log^{1/2} n\) if \(d = \Omega(\log n)\), answering a question of Meng and Mahoney (STOC, 2013). We apply our results to \(\ell_p\)-regression, obtaining a \((1+\epsilon)\)-approximation in \(O(\mathrm{nnz}(M) \log n) + \mathrm{poly}(d/\epsilon)\) time, improving the best known \(\mathrm{poly}(d/\epsilon)\) factors for every \(p \in [1, \infty) \setminus \{2\}\). If one is only interested in a \(\mathrm{poly}(d)\)-approximation rather than a \((1+\epsilon)\)-approximation to \(\ell_p\)-regression, a corollary of our results is that for all \(p \in [1, \infty)\) we can solve the \(\ell_p\)-regression problem without general convex programming: since our subspace embeds into \(\ell_\infty\), it suffices to solve a linear program. Finally, we give the first protocols for the distributed \(\ell_p\)-regression problem for every \(p \ge 1\) that are nearly optimal in communication and computation.
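
The exponential random variables in the title can be connected to the \(\ell_\infty\) embedding through a standard identity: if \(E_1, \dots, E_n\) are i.i.d. standard exponentials and \(D = \mathrm{diag}(E_1^{-1/p}, \dots, E_n^{-1/p})\), then \(\|Dy\|_\infty\) has the same distribution as \(\|y\|_p / E^{1/p}\) for a single standard exponential \(E\), because \(\min_i E_i/|y_i|^p\) is exponential with rate \(\sum_i |y_i|^p\). The pure-Python sketch below checks this identity numerically and applies a matrix of the form \(\Pi = SD\) to a sparse \(M\) in one pass over its nonzeros. It is a minimal illustration under stated assumptions, not the paper's construction: the CountSketch-style \(S\) (one random output row and one random sign per row of \(M\)), the number of output rows, and the helper names `sketch_columns` and `max_stability_ratio` are all hypothetical.

```python
import math
import random


def sketch_columns(M, p, num_rows, seed=0):
    """Return Pi @ M as a sparse dict, for a hypothetical Pi = S @ D.

    Illustrative assumptions (not taken from the paper):
      * D = diag(E_i ** (-1/p)) with E_i i.i.d. standard exponentials;
      * S is CountSketch-style: row i of M is added into one uniformly
        random output row h(i) with a random sign s(i).
    M is a dict {(row, col): value} holding only the nonzero entries.
    """
    rng = random.Random(seed)
    rows = sorted({i for (i, _) in M})  # sorted so the draws are reproducible
    scale = {i: rng.expovariate(1.0) ** (-1.0 / p) for i in rows}
    bucket = {i: rng.randrange(num_rows) for i in rows}
    sign = {i: rng.choice((-1.0, 1.0)) for i in rows}
    PM = {}
    # A single pass over the nonzeros of M, so this loop does O(nnz(M)) work.
    for (i, j), v in M.items():
        key = (bucket[i], j)
        PM[key] = PM.get(key, 0.0) + sign[i] * scale[i] * v
    return PM


def max_stability_ratio(y, p, trials, seed=1):
    """Average of (||y||_p / ||D y||_inf) ** p over independent draws of D.

    That quantity is a standard exponential, so the average should be near 1.
    """
    rng = random.Random(seed)
    norm_p = sum(abs(v) ** p for v in y) ** (1.0 / p)
    total = 0.0
    for _ in range(trials):
        dy_inf = max(abs(v) * rng.expovariate(1.0) ** (-1.0 / p) for v in y)
        total += (norm_p / dy_inf) ** p
    return total / trials


if __name__ == "__main__":
    # A 3 x 2 matrix stored sparsely as {(row, col): value}.
    M = {(0, 0): 1.0, (1, 1): 2.0, (2, 0): -1.0, (2, 1): 0.5}
    print(sketch_columns(M, p=3.0, num_rows=2))
    print(max_stability_ratio([1.0, -2.0, 0.5], p=3.0, trials=20000))
```

Because \((\|y\|_p/\|Dy\|_\infty)^p\) is a standard exponential, `max_stability_ratio` should return a value close to 1 for any nonzero `y` and any \(p \ge 1\), including \(p > 2\), where no \(p\)-stable distribution exists. Storing \(M\) as a dictionary of nonzeros is what keeps the sketching step proportional to \(\mathrm{nnz}(M)\).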

Similar resources

Subspace Embeddings for the Polynomial Kernel

Sketching is a powerful dimensionality reduction tool for accelerating statistical learning algorithms. However, its applicability has been limited to a certain extent since the crucial ingredient, the so-called oblivious subspace embedding, can only be applied to data spaces with an explicit representation as the column span or row span of a matrix, while in many settings learning is done in a...

Experimental study for the comparison of classifier combination methods

In this paper, we compare the performances of classifier combination methods (bagging, modified random subspace method, classifier selection, parametric fusion) to logistic regression in consideration of various characteristics of input data. Four factors used to simulate the logistic model are: (a) combination function among input variables, (b) correlation between input variables, (c) varianc...

A Random Walk with Exponential Travel Times

Consider the random walk among N places with N(N - 1)/2 transports. We attach an exponential random variable Xij to each transport between places Pi and Pj and take these random variables mutually independent. If transports are possible or impossible independently with probability p and 1-p, respectively, then we give a lower bound for the distribution function of the smallest path at point log...

A Bernstein-type Inequality for Suprema of Random Processes with Applications to Model Selection in Non-gaussian Regression

Let (Xt)t∈T be a family of real-valued centered random variables indexed by a countable set T . In the first part of this paper, we establish exponential bounds for the deviation probabilities of the supremum Z = supt∈T Xt by using the generic chaining device introduced in Talagrand (1995). Compared to concentration-type inequalities, these bounds offer the advantage to hold under weaker condit...

Journal:
  • CoRR

Volume: abs/1305.5580

Publication date: 2011